Abstract

Color-based re-identification requires a distance measure to
reach a decision. In this paper we study the behavior of several
histogram distance measures in different color spaces. We ask
whether a particular histogram distance measure performs better
than the others and, likewise, whether some color space offers
better discrimination. Several experiments are designed and
evaluated on several image databases to compare the measures
across color spaces. For each configuration a ranking is
generated and the area under the CMC (Cumulative Match Curve) is
computed; this area is the indicator used to evaluate which
distance measure and color space perform best for the problem.
Other parameters, such as the division of the image into
horizontal stripes and the number of histogram bins, have also
been studied.

Keywords: Re-identification, color histograms, distance 
measures 

1. Introduction 

Nowadays, thanks to cheaper sensors and processors for video
cameras, surveillance camera networks are widespread. These
cameras can be useful for locating missing persons, tracking
thieves, detecting incidents, controlling traffic, etc. The
scenarios vary widely, but may be roughly divided into indoor
and outdoor areas, such as a hospital or a highway respectively.
These monitoring systems accumulate a huge amount of information
that must be processed. Human processing of the visual
information gathered by the camera network may be unfeasible in
certain scenes, so intelligent systems have arisen for
re-identification [25].

According to [14], "the concept of re-identification is defined
as the fundamental task for a system of distributed cameras or
not, with an association of people is through the images
captured by these at a certain location and time".


In a re-identification problem we have the probe, which is the
individual we intend to re-identify, and the gallery, which is
the set of individuals among which the search is performed. The
task of re-identification is formally defined in equation (1):

T = \arg\min_{T_i} D(T_i, Q),  T_i \in \Gamma    (1)

where D() is a dissimilarity measure that matches the probe Q
against the n candidates from the gallery
\Gamma = \{T_1, \ldots, T_n\}.
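
As a minimal sketch of equation (1), the following assumes that every individual is already described by a feature vector and uses the Euclidean norm as a stand-in for D. The function name `reidentify` and the toy vectors are hypothetical and only illustrate the argmin over the gallery.

```python
import numpy as np

def reidentify(probe, gallery, distance):
    """Return the index of the gallery entry closest to the probe (equation (1))."""
    scores = [distance(candidate, probe) for candidate in gallery]
    return int(np.argmin(scores))

# Toy descriptors (assumed values); the Euclidean norm stands in for D.
gallery = [
    np.array([0.70, 0.20, 0.10]),
    np.array([0.10, 0.80, 0.10]),
    np.array([0.30, 0.30, 0.40]),
]
probe = np.array([0.15, 0.75, 0.10])

def euclidean(a, b):
    return float(np.linalg.norm(a - b))

best = reidentify(probe, gallery, euclidean)
```

Any of the histogram measures discussed later could be passed as `distance`, provided that lower values mean higher similarity.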

There are different parameters that must be considered 
in the re-identification problem. To describe individuals, 
we must take into account the descriptors, that are calcu- 
lated for each individual. These descriptors can be based on 
color, shape, texture, position or biometric features. 

Two different sets of characteristics are commonly extracted
from the images for this problem: part-based body models and
features [22]. Part-based body models represent how the body is
divided in order to obtain the features in each area. There are
different models to extract this information, such as Fixed Part
Models, Adaptive Part Models and Learned Part Models. Fixed Part
Models divide the individual into a few fixed horizontal stripes
that represent the head, torso and legs [6, 16]. Other models
use heuristic methods and classifiers to identify parts of the
body [9, 12, 13]. On the other hand, the image is described by
one or more descriptors that represent its features, which can
be global or local. Global features are characteristics that
represent the image or a part of it; they can be color
histograms, textures or edges [24, 23]. Local features are
related to pixels in a small area of the image; the most
widespread methods are SIFT (Scale Invariant Feature Transform),
SURF (Speeded-Up Robust Features) and LBP (Local Binary Pattern)
[18, 7, 20].

We focus on the feature set, specifically in descriptors 
based on color. There are multiple color spaces with their 
respective characteristics in relation to the luminance and 
chrominance. 

Color spaces that are often used in the field of computer 
vision are: 



RGB (Red Green Blue) is the typical color space found in
many devices. It comprises three channels (red, green and
blue) in which luminance and chrominance are not
separated. It is not a perceptually uniform space:
distant colors are not perceived as such, and vice versa.

HSV (Hue Saturation Value) is a color space that consists
of three channels that characterize the hue (H),
saturation (S) and value (V). The non-linear
transformation from RGB to HSV is given in equation (2).
red. The transformation from RGB to YCbCr appears
in equation (4).
In some problems, the image histogram provides enough
information to describe an image. This method captures only the
color information of the image and discards the spatial
information. A histogram represents the number of occurrences of
the pixel values in the image; it is defined formally in
equation (5). A histogram is made up of a certain number of
bins. Each bin is one of the intervals into which the whole
range of values represented by the histogram is divided.

H(i) = \sum_{p} \delta_i(p),  where  \delta_i(p) = 1 if the value of pixel p falls in bin i, 0 otherwise    (5)
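
The counting described by equation (5) can be sketched with NumPy as follows. The helper name `channel_histogram`, the 8-bit value range and the toy pixel values are assumptions made for illustration.

```python
import numpy as np

def channel_histogram(channel, n_bins, value_range=(0, 256)):
    """Count how many pixel values fall into each of n_bins equal-width bins."""
    counts, _ = np.histogram(channel, bins=n_bins, range=value_range)
    return counts

# Toy single-channel image with 8 pixels (assumed values).
pixels = np.array([[0, 15, 16, 255],
                   [128, 129, 200, 31]], dtype=np.uint8)

# With 16 bins over 0..255 each bin covers 16 consecutive values.
hist = channel_histogram(pixels, n_bins=16)
```

Each pixel contributes exactly one count, so the histogram always sums to the number of pixels whatever the number of bins.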
where MAX is the maximum value of R, G and B, and MIN is
the minimum value of R, G and B.

CIELAB was defined by the CIE (Commission Internationale
de l'Éclairage); L stands for lightness and A and B for
the color-opponent dimensions. It is a color space that
aims to be perceptually linear. The transformation from
RGB to CIELAB is given in equation (3).

X = 0.412453 R + 0.357580 G + 0.180423 B

Y = 0.212671 R + 0.715160 G + 0.072169 B

Z = 0.019334 R + 0.119193 G + 0.950227 B
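
The conversion of equation (3) can be sketched as below, under stated assumptions: the RGB values are taken as linear and scaled to [0, 1], f is taken in its standard CIE form, and the white point (X_n, Y_n, Z_n) is assumed to be the XYZ value that the matrix above assigns to RGB = (1, 1, 1). The names `RGB_TO_XYZ`, `WHITE` and `rgb_to_lab` are hypothetical.

```python
import numpy as np

# RGB -> XYZ matrix with the coefficients quoted in the text.
RGB_TO_XYZ = np.array([
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
])

# Assumed white point: the XYZ value of RGB = (1, 1, 1) under this matrix.
WHITE = RGB_TO_XYZ @ np.ones(3)

def f(q):
    """Cube root above the threshold (6/29)^3, linear segment below it."""
    delta = 6.0 / 29.0
    q = np.asarray(q, dtype=float)
    return np.where(q > delta ** 3, np.cbrt(q), q / (3 * delta ** 2) + 4.0 / 29.0)

def rgb_to_lab(rgb):
    """Convert one RGB triple in [0, 1] (assumed linear) to (L*, a*, b*)."""
    x, y, z = (RGB_TO_XYZ @ np.asarray(rgb, dtype=float)) / WHITE
    fx, fy, fz = f(x), f(y), f(z)
    return np.array([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)])

white_lab = rgb_to_lab([1.0, 1.0, 1.0])
black_lab = rgb_to_lab([0.0, 0.0, 0.0])
```

With this choice of white point, white maps to L* = 100 and black to L* = 0, with a* = b* = 0 in both cases. Real images usually need gamma decoding before this step, which the sketch omits.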


When working with histograms, it is good practice to normalize
them, usually so that the description resembles a probability
function; see equation (6). This preprocessing step brings any
distribution to a common scale regardless of the image size.
Histograms must then be compared to know how similar they are,
as required by equation (1) [10]; for this reason, the
literature has defined different types of distance measures for
histograms.

H'(i) = \frac{H(i)}{\sum_{j=1}^{n} H(j)},  where  \sum_{i=1}^{n} H'(i) = 1    (6)
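
A minimal sketch of the normalization in equation (6) follows. The function name `normalize_histogram` and the handling of an all-zero histogram are assumptions; the example shows that two regions with the same color proportions but different sizes yield the same normalized histogram.

```python
import numpy as np

def normalize_histogram(hist):
    """Scale a histogram so that its bins sum to 1 (equation (6))."""
    hist = np.asarray(hist, dtype=float)
    total = hist.sum()
    if total == 0:
        return hist  # assumption: an empty histogram is returned unchanged
    return hist / total

small = normalize_histogram([2, 2, 4])        # 8 pixels
large = normalize_histogram([200, 200, 400])  # 800 pixels, same proportions
```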

L* = 116 f(Y/Y_n) - 16

a* = 500 (f(X/X_n) - f(Y/Y_n))

b* = 200 (f(Y/Y_n) - f(Z/Z_n))

f(q) = q^{1/3}                         if q > (6/29)^3
f(q) = \frac{1}{3} (29/6)^2 q + 4/29   otherwise        (3)

where X_n, Y_n and Z_n are the values of the reference white
point defined by the CIE standard illuminant.

YCbCr is a color space that is made up of a luminance
component (Y) and two color components (Cb and Cr), that
represent the chrominance in blue and

A distance d(x, y) is defined on a space of dimension n,
d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}, and must
satisfy the following properties:

• d(x, y) \geq 0

• d(x, x) = 0

• d(x, y) = d(y, x)

If it also satisfies:

• d(x, y) = 0 iff x = y

• d(x, y) \leq d(x, k) + d(k, y)

it is said to be a metric distance.

There are two groups of distance measures for histograms:
bin-to-bin and cross-bin. The first group compares the content
of each bin with the corresponding bin of the second histogram
and does not exploit the information of neighboring bins. The
second group also takes into account the values of the bins
neighboring the one being processed. Some distance measures
commonly used between histograms are the following:

• Bin to bin measures: 

- Bhattacharyya [19] measures the similarity of 
two probability distributions. It has a computa- 
tional complexity O(n). 


should pay to transform a histogram into another.
It has a computational complexity O(n^3 log n).
- Mahalanobis [15] is a measure of the distance
between a point and a distribution. It has a
computational complexity O ( n ^ n ~ 1 ^ ) . 


- Chi Square [26] has statistical origin. It has a 
computational complexity O(n). 


D(x, y) = \frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{x_i + y_i}    (8)

D(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}    (13)
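
As an illustration, the Chi Square and Mahalanobis measures of equations (8) and (13) can be sketched as follows, assuming their standard forms: half the sum of squared bin differences divided by the bin sums, and the square root of the quadratic form with the inverse covariance matrix S. Function names and example histograms are hypothetical.

```python
import numpy as np

def chi_square(x, y):
    """Chi Square distance; undefined if some bin is empty in both histograms."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * float(np.sum((x - y) ** 2 / (x + y)))

def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y given the covariance matrix cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

x = np.array([0.5, 0.3, 0.2])
y = np.array([0.4, 0.4, 0.2])

d_chi = chi_square(x, y)
# With an identity covariance the Mahalanobis distance reduces to the Euclidean one.
d_mah = mahalanobis(x, y, np.eye(3))
```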


- Correlation [4] is a measure of the statistical
relationship, or dependence, between two variables. It
has a computational complexity O(n).


D(x,y) 
- Intersection [26] is a measure that comes from
the intersection of the two histograms. This
measure performs well. It has a computational
complexity O(n).


D(x, y) = \sum_{i=1}^{n} \min(x_i, y_i)    (10)
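
Equation (10) can be sketched in a few lines. The example histograms are assumed to be normalized, in which case the intersection of a histogram with itself is 1; note that a higher value means higher similarity here.

```python
import numpy as np

def intersection(x, y):
    """Sum of the bin-wise minima of two histograms (equation (10))."""
    return float(np.sum(np.minimum(x, y)))

x = np.array([0.5, 0.3, 0.2])
y = np.array([0.4, 0.4, 0.2])

overlap = intersection(x, y)   # 0.4 + 0.3 + 0.2
self_overlap = intersection(x, x)
```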


- KL (Kullback-Leibler) [17] divergence is a measure
that originates in information theory. This measure
does not satisfy the symmetry property. The bins of
the second histogram cannot have zero value, as this
makes the result indeterminate. It has a
computational complexity O(n).

n 
• Cross-bin distances:

- Earth mover’s distance (EMD) [26] is a measure 
which is defined by the minimum cost that we 


where: S = Covariance matrix 

2. Methodology 
2.1. Databases 
Table 1. Comparison of the databases used: number of
individuals, resolution (pixels) and sample image.


To perform the experiment, we consider several databases. Each
database has its own characteristics: illumination conditions,
capture sensor, noise, etc. We use CAVIAR4REID [2], i-LIDS [1],
VIPeR [3] and QMUL GRID [5]. Some of their characteristics are
shown in Table 1.

2.2. Division by stripes 

Dividing the image of the individual into stripes [6] or
sections, see Figure 1, can be a way to obtain good results. The
results improve because a local color distribution is obtained,
which is less affected by noise. For instance, we could obtain
the information of the head, torso and legs separately.
Furthermore, smaller areas could be weighted with lower values
and vice versa. However, the number of histogram bins must be
configured carefully, because many null bins may appear in all
the stripes of the image.
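
The stripe division can be sketched as follows: the image is split into horizontal stripes and one normalized histogram per channel is computed for each stripe. The function name, the random test image and its 48x128 size are assumptions for illustration only.

```python
import numpy as np

def stripe_histograms(image, n_stripes, n_bins):
    """Return an array of shape (n_stripes, channels, n_bins) with one
    normalized histogram per horizontal stripe and color channel."""
    stripes = np.array_split(image, n_stripes, axis=0)  # split along the height
    n_channels = image.shape[2]
    out = np.zeros((n_stripes, n_channels, n_bins))
    for s, stripe in enumerate(stripes):
        for c in range(n_channels):
            counts, _ = np.histogram(stripe[:, :, c], bins=n_bins, range=(0, 256))
            out[s, c] = counts / max(counts.sum(), 1)
    return out

# Hypothetical 48x128 (width x height) image with random 8-bit colors.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 48, 3), dtype=np.uint8)

descriptor = stripe_histograms(image, n_stripes=5, n_bins=16)
```

With many stripes and many bins, each stripe holds few pixels and a growing share of the bins stays empty, which is the effect the paragraph above warns about.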



Figure 1. Example of the stripe division of an individual's image.


2.3. Experiment design 

We want to know whether any histogram distance measure
outperforms the others. To study the behavior of the measures we
consider different parameters that can affect them. These
parameters are:

• Distance measures: Bhattacharyya, Chi Square, Corre- 
lation, EMD, Intersection, Mahalanobis and Kullback- 
Leibler 

• Color spaces: RGB, HSV and CIELAB 

• Number of stripes of the image division: complete im- 
age, 5 stripes, 10 stripes and 25 stripes 

• Number of bins for the histogram: 16, 32, 64 and 128 

• Databases: CAVIAR4REID, i-LIDS, VIPeR and QMUL GRID
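
The parameter grid listed above can be enumerated as in the following sketch, which only builds the list of configurations and assumes a full factorial design over the five parameters; the variable names are hypothetical.

```python
from itertools import product

distances = ["Bhattacharyya", "Chi Square", "Correlation", "EMD",
             "Intersection", "Mahalanobis", "Kullback-Leibler"]
color_spaces = ["RGB", "HSV", "CIELAB"]
stripes = [1, 5, 10, 25]        # 1 stands for the complete image
bins = [16, 32, 64, 128]
databases = ["CAVIAR4REID", "i-LIDS", "VIPeR", "QMUL GRID"]

# Every combination of the five parameters is one experiment.
configs = list(product(distances, color_spaces, stripes, bins, databases))
n_configs = len(configs)
```

Under that assumption the grid contains 7 x 3 x 4 x 4 x 4 = 1344 configurations.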


2.4. Evaluation 

The area under the CMC (Cumulative Match Curve) is used as the
performance indicator. The CMC is widely used as a performance
measure in re-identification [8]. To calculate the CMC, for each
probe we rank the individuals of the gallery in decreasing order
of similarity. Then the rank position at which each probe finds
its correct match is counted. Finally, the counts are
accumulated over the ranks and divided by the number of probes,
and the curves are generated from these values.
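
The steps above can be sketched as follows, under two assumptions: every probe has exactly one correct match in the gallery, and the area is reported as the mean of the CMC values (a normalization to [0, 1] that the text does not specify). Function names and the toy distance matrix are hypothetical.

```python
import numpy as np

def cmc_curve(dist_matrix, probe_ids, gallery_ids):
    """dist_matrix[p, g] is the distance between probe p and gallery item g
    (lower means more similar). Returns the cumulative match curve."""
    gallery_ids = np.asarray(gallery_ids)
    n_probes, n_gallery = dist_matrix.shape
    hits = np.zeros(n_gallery)
    for p in range(n_probes):
        order = np.argsort(dist_matrix[p])  # most similar gallery item first
        rank = int(np.where(gallery_ids[order] == probe_ids[p])[0][0])
        hits[rank] += 1                     # position of the correct match
    return np.cumsum(hits) / n_probes

def cmc_area(cmc):
    """Normalized area under the CMC (assumption: mean of the curve values)."""
    return float(np.mean(cmc))

dist = np.array([[0.1, 0.5, 0.9],
                 [0.6, 0.4, 0.2],
                 [0.3, 0.8, 0.5]])
cmc = cmc_curve(dist, probe_ids=[0, 1, 2], gallery_ids=[0, 1, 2])
area = cmc_area(cmc)
```

In this toy case one probe is matched at the first rank and the other two at the second rank, so the curve is 1/3 at rank 1 and 1 from rank 2 onwards.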

3. Results 

In order to eliminate possible indeterminate values during the
computation of the distances, and to give all of them a similar
representation range, we first modified the Chi Square and KL
(Kullback-Leibler) distances. For the first distance, equation
(8) shows that the result is undefined if the denominator is
zero; this occurs when the two bins being processed are both
null. Our approach to eliminate this indeterminacy is to discard
these bins from the calculation. For the second measure, the
Kullback-Leibler equation (11) does not admit bins with value
zero, because two situations may occur: the denominator is zero
and the quotient is infinite, or the numerator is zero and we
obtain ln(0), which is unbounded. The solution is to discard
from the analysis every pair of bins in which either of the two
compared bins is null.
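
The bin-discarding rule described above can be sketched as follows. The Chi Square form follows equation (8); for KL the standard form, the sum of x_i ln(x_i / y_i), is assumed. The function names are hypothetical.

```python
import numpy as np

def chi_square_safe(x, y):
    """Chi Square distance that skips bins that are empty in both histograms."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    keep = (x + y) > 0
    return 0.5 * float(np.sum((x[keep] - y[keep]) ** 2 / (x[keep] + y[keep])))

def kl_safe(x, y):
    """KL divergence that skips every pair in which either bin is empty."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)
    return float(np.sum(x[keep] * np.log(x[keep] / y[keep])))

x = np.array([0.5, 0.5, 0.0, 0.0])
y = np.array([0.25, 0.25, 0.5, 0.0])

d_chi = chi_square_safe(x, y)  # last bin is discarded
d_kl = kl_safe(x, y)           # last two bins are discarded
```

In the example the third bin holds half of the mass of y, yet it is ignored by `kl_safe` because it is empty in x; this loss of information grows as more bins become empty.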

Depending on the distance, higher similarity can correspond to
either a lower or a higher value. Therefore, equations (9), (10)
and (11), which correspond to the Correlation, Intersection and
KL distances respectively, have been modified so that the lower
the value, the higher the similarity. This comes from the need
for all distance measures to share a similar representation
range. To this end, we have changed the Correlation and
Intersection equations as shown in equation (14) and, for the KL
distance, added the conditions in equation (15).

If distance \in {Correlation, Intersection}:

distance^f(x, y) = -distance(x, y)    (14)

If distance = KL:

KL^f(x, y) = KL(x, y)     if KL(x, y) \geq 0
KL^f(x, y) = -KL(x, y)    if KL(x, y) < 0    (15)
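
A minimal sketch of equations (14) and (15) follows, assuming the raw value of each measure has already been computed. The function name `to_lower_is_better` and the string labels are hypothetical.

```python
def to_lower_is_better(name, value):
    """Map a raw measure so that a lower value means higher similarity."""
    if name in ("Correlation", "Intersection"):
        return -value        # equation (14): flip the sign
    if name == "KL":
        return abs(value)    # equation (15): keep if non-negative, negate otherwise
    return value             # other measures are left unchanged

scores = {
    "Intersection": to_lower_is_better("Intersection", 0.9),
    "KL": to_lower_is_better("KL", -0.2),
    "Chi Square": to_lower_is_better("Chi Square", 0.3),
}
```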

As the graphical representation of the CMC is difficult to use
for comparing results, we have computed the CMC area averaged
over the numbers of bins, the numbers of stripes and the
databases. The data are shown in Figure 2, where the abscissa
groups the CMC areas for each distance and color space. The
distances that provide the best results are Bhattacharyya, Chi
Square and Intersection. These three measures are bin-to-bin
distances, and they behave consistently across color spaces. The
color space with the best performance is HSV, which may be
related to its separation into chrominance and luminance
components.





Figure 2. CMC area for distances and color space. 

To observe the effect of the number of bins, we compute the CMC
area averaged over the color spaces, the numbers of stripes and
the databases. The results are shown in Figure 3, where the
abscissa groups the CMC areas for each distance with the
different numbers of bins. The distances that obtain good
results are again Bhattacharyya, Chi Square and Intersection. It
should be emphasized that reducing the histogram by using a
large bin size provides results similar to those obtained with
smaller bins. Larger bins are therefore attractive, because the
histogram has fewer bins and the calculations can be computed
faster. The behavior of the KL measure deserves comment. KL
obtains worse results with a higher number of bins, which is due
to the approach we have used to resolve its indeterminate cases.
Increasing the number of bins increases the probability that a
color does not appear in a stripe and leaves holes in the
histogram. These empty bins have value zero, which means that
they are not processed when the distance to the other color
histogram is computed, even if the other histogram has non-zero
values in those bins. As fewer bins are actually processed,
information is lost; a possible solution is to use the Jeffrey
divergence [21]. In conclusion, a small number of bins provides
better results, although the improvement is not significant. As
an initial configuration for a problem, we propose to use 16 or
32 bins for the histogram, so as not to lose too much
information.

Figure 3. CMC area for distances and number of bins.

To observe the influence of the number of stripes, we compute
the CMC area averaged over the color spaces, the numbers of bins
and the databases. The results are shown in Figure 4, where the
abscissa groups the CMC areas for each distance with the
different stripe configurations. The distances that provide the
best results are Bhattacharyya, Chi Square and Intersection.
Dividing the image into stripes significantly improves the
results, but beyond a certain point an excess of stripes makes
the results worse. This is because the higher the number of
stripes, the fewer pixels each stripe has, so with many stripes
we obtain many histograms with null bins. On the other hand,
when a single histogram is used for the whole image, detailed
information about the color distribution in certain areas of the
image is lost. The KL distance behaves abnormally compared with
the rest of the distances. As described before, this is because
important information has been discarded during the distance
computation. We propose 5 or 10 stripes as the initial
configuration, as both configurations exhibit the best results
and do not differ significantly from each other.
Figure 4. CMC area for distances and number of stripes.

It is also interesting to check the processing time, since
re-identification requires a trade-off between time and results.
Figure 5 shows the execution time of the tests, measured in
hours, as a function of the number of bins and the number of
stripes. These results refer to the total time needed to process
the four databases with 3 color spaces, 7 distance measures and
16 settings of the number of bins and the number of stripes.
Increasing the number of bins and the number of stripes clearly
affects the runtime of the tests. For each number of bins, the
runtime grows approximately linearly with the number of stripes.
In contrast, no clear relationship is observed with the number
of bins.



Figure 5. Performance in hours for number of bins and number of 
stripes. 


4. Conclusion 

We have studied the problem of appearance-based
re-identification, using color histograms for image
representation. We have evaluated different distance measures
for comparing histograms in various color spaces, with different
numbers of bins and different numbers of stripes in the image.

We propose an initial configuration to solve the problem of
re-identification. This configuration is not guaranteed to be
the best one for a specific problem, but it has a high
probability of success. After performing and analysing the
experiments, we have reached the following conclusions.

We propose Bhattacharyya, Chi Square and Intersection as a first
approach to solving the problem, because these measures achieve
good performance and very similar results for all the settings.
HSV is the color space that reported the best results by far.
This may be due to its separation into luminance and
chrominance, which is quite useful when dealing with scenes
with changing illumination conditions.

To configure the number of bins in the histogram, we propose a
starting value between 16 and 32 bins. A large number of bins
generates noise, because the image does not contain the full
range of colors, and it also adds computational cost when the
data are processed. Finally, we have chosen to divide the image
into between 5 and 10 stripes, the configurations that performed
best: the complete image does not provide knowledge about
specific areas, whereas excessive divisions generate noise in
the histograms.



Preference configuration

Distance             Bhattacharyya, Chi Square and Intersection
Color Space          HSV
Number of bins       16 and 32
Number of stripes    5 and 10


Table 2. Proposed initial configuration. 


As future work, we plan to use more color spaces to perform the
tests. It would also be convenient to test a larger number of
databases. Paper [11] presents a database that aggregates
multiple sets of images, covering images with varied
characteristics. Furthermore, the study would benefit from a
larger number of distance measures, such as the Jeffrey
divergence to resolve the KL indeterminacies. Finally, we could
run a new experiment based on our proposed initial configuration
using different image sizes; starting from that configuration
avoids the high computational cost of repeating all the
experiments from the beginning.